
    TamilTB: An Effort Towards Building a Dependency Treebank for Tamil

    Annotated corpora such as treebanks are important for developing parsers and other language applications, as well as for understanding the language itself. Only a few languages possess such resources. In this paper, we describe our effort to syntactically annotate a small corpus (600 sentences) of Tamil. Our annotation is similar to that of the Prague Dependency Treebank (PDT 2.0) and consists of two layers: (i) a morphological layer (m-layer) and (ii) an analytical layer (a-layer). For each layer we introduce an annotation scheme: positional tagging for the m-layer, and dependency relations (with guidelines on how dependency structures should be drawn) for the a-layer. Finally, we evaluate the corpus on tagging and parsing tasks using well-known taggers and parsers, and discuss some general issues in annotating Tamil.
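
    Positional tagging, as used on the m-layer, stores each morphological category at a fixed character position of the tag, in the style of PDT 2.0. The sketch below decodes such a tag into named categories. The four-position layout (`POSITIONS`) and the letter values in the example are hypothetical, chosen only to illustrate the idea; they are not the actual TamilTB tagset.

```python
# Hypothetical 4-position layout, for illustration only; NOT the TamilTB tagset.
POSITIONS = ["pos", "case", "person", "number"]


def decode_tag(tag: str) -> dict[str, str]:
    """Map each character of a positional tag to the category at that position.

    A '-' means the category does not apply to this word form, so it is skipped.
    """
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} positions, got {len(tag)}: {tag!r}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


# Hypothetical example: noun ('N'), accusative ('A'), person n/a ('-'), singular ('S').
print(decode_tag("NA-S"))  # {'pos': 'N', 'case': 'A', 'number': 'S'}
```

    Because every category has a fixed slot, a tagger can treat each position as a separate prediction problem, and partial tags (e.g. only the POS position) are easy to extract.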

    Improvements to Korektor: A Case Study with Native and Non-Native Czech

    We present recent developments in Korektor, a statistical spell-checking system. In addition to a lexicon, Korektor uses language models to find real-word errors, which are detectable only in context. The models and error probabilities, learned from error corpora, are also used to suggest the most likely corrections. Korektor was originally trained on a small error corpus and used language models extracted from the in-house corpus WebColl. We show two recent improvements: • We built new language models from freely available (shuffled) versions of the Czech National Corpus and show that they perform consistently better on texts produced both by native speakers and by non-native learners of Czech. • We trained new error models on a manually annotated learner corpus and show that they outperform the standard error model in error detection, not only on the learners' texts but also on our standard evaluation data of native Czech. For error correction, the standard error model outperformed the non-native models on 2 of the 3 test datasets. We discuss reasons for this not entirely intuitive improvement. Based on these findings and on an analysis of errors in both native and learners' Czech, we propose directions for further improvements of Korektor.
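
    Combining an error model with a language model to rank corrections is commonly formalised as a noisy channel: pick the candidate c that maximises P(typed | c) · P(c | context). The toy sketch below shows why a real-word error is only detectable in context. It is a generic illustration, not Korektor's implementation or API; the probability tables (`ERROR_MODEL`, `BIGRAM_LM`) are made-up numbers and the example is in English for readability.

```python
import math

# Hypothetical error model: P(typed word | intended word), keyed by (typed, intended).
ERROR_MODEL = {
    ("by", "by"): 0.95,   # intended "by", typed correctly
    ("by", "buy"): 0.02,  # intended "buy", typed "by" (a real-word error)
    ("by", "bye"): 0.01,
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM_LM = {
    ("to", "by"): 0.001,
    ("to", "buy"): 0.1,
    ("to", "bye"): 0.0005,
}


def best_correction(prev: str, typed: str, candidates: list[str]) -> str:
    """Return the candidate maximising log P(typed | c) + log P(c | prev)."""

    def score(c: str) -> float:
        p_err = ERROR_MODEL.get((typed, c), 0.0)
        p_lm = BIGRAM_LM.get((prev, c), 0.0)
        if p_err == 0.0 or p_lm == 0.0:
            return float("-inf")  # unseen events rule the candidate out in this toy
        # Summing logs avoids underflow when many factors are multiplied.
        return math.log(p_err) + math.log(p_lm)

    return max(candidates, key=score)


print(best_correction("to", "by", ["by", "buy", "bye"]))  # buy
```

    "by" is a valid word, so a lexicon alone accepts it; only the language model, which makes "buy" far more likely after "to", tips the score (0.02 × 0.1 = 0.002 versus 0.95 × 0.001 = 0.00095).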

    Foliar application of Ascophyllum nodosum on improvement of photosynthesis, fruit setting percentage, yield and quality of tomato (Solanum lycopersicum L.)

    Liquid formulations of extract from the brown seaweed Ascophyllum nodosum have recently come into use as biostimulants in agriculture. Various studies suggest that A. nodosum enhances the growth and yield of agriculturally important crops, but information on its biostimulant effects on photosynthesis, flowering and fruit setting in tomato is still lacking. The present study therefore examined the effect of foliar application of A. nodosum on photosynthesis, flowering, fruit setting, yield and quality of tomato. MC Set, a biostimulant product containing A. nodosum extract, was applied to tomato as a foliar spray at three rates, 1.0 L ha−1 (MS 1), 2.0 L ha−1 (MS 2) and 3.0 L ha−1 (MS 3), six times: at flowering of the 2nd (30 days after transplanting, DAT), 3rd (40 DAT) and 4th (50 DAT) clusters, and at fruit setting of the 2nd (60 DAT), 3rd (70 DAT) and 4th (80 DAT) clusters. The MC Set treatments enhanced photosynthesis, flower and fruit number per cluster, yield and quality traits of tomato. The intermediate concentration, MS 2, gave the highest photosynthetic rate, stomatal conductance, SPAD value, and flower and fruit numbers in the 2nd, 3rd and 4th clusters. It also gave higher average fruit weight and higher yield per plant and per hectare, and improved quality parameters such as total soluble solids, ascorbic acid content, lycopene and total sugars compared with the control and the other two concentrations of MC Set. Applying A. nodosum extract to tomato could therefore be a better, sustainable crop production method.

    Understanding the molecular basis of plant growth promotional effect of Pseudomonas fluorescens on rice through protein profiling

    Background: The plant growth-promoting rhizobacterium (PGPR) Pseudomonas fluorescens strain KH-1 was found to promote plant growth in rice under both in vitro and in vivo conditions, but the mechanism underlying this activity is not yet clearly understood. In this study, we sought to elucidate the molecular responses of rice plants to P. fluorescens treatment through protein profiling. Two-dimensional polyacrylamide gel electrophoresis was used to identify PGPR-responsive proteins, and the differentially expressed proteins were analyzed by mass spectrometry (MS). Results: Upon priming with P. fluorescens, 23 proteins were differentially expressed in rice leaf sheaths. MS analysis revealed differential expression of several important proteins, namely putative p23 co-chaperone, thioredoxin h (rice), ribulose-bisphosphate carboxylase large chain precursor, nucleotide diphosphate kinase, proteasome subunit protein and putative glutathione S-transferase. Conclusion: The differentially expressed proteins have been reported to be directly or indirectly involved in plant growth promotion. This study thus confirms the primary role of PGPR strain KH-1 in promoting rice growth.

    A high-throughput regeneration protocol for recalcitrant tropical Indian maize (Zea mays L) inbreds

    Immature embryos of five selected recalcitrant maize (Zea mays L.) inbreds were used as explants and evaluated for their ability to form callus and somatic embryos and subsequently regenerate into plants. The embryos were placed on N6 basal medium with varying levels of 2,4-D (0.5, 1.0, 1.5, 2.0 and 2.5 mg l−1) and regenerated on MS medium supplemented with BAP (2–10 mg l−1), 2,4-D (0.25 mg l−1) and silver nitrate (0.85 mg l−1). Explants cultured on N6 medium supplemented with 2.0 mg l−1 2,4-D gave the highest frequency of embryogenic calli, and those of UMI 29 were highly embryogenic (78.67%). To investigate synergism between dicamba and 2,4-D in Type II callus production in UMI 29, 2,4-D (1 or 2 mg l−1) was tested alone and in combination with dicamba (3.7 mg l−1); the highest frequency of Type II callus (83.33%) was observed on N6 medium containing 3.7 mg l−1 dicamba + 1 mg l−1 2,4-D. The highest percentage of shoot induction (82.67%) was observed on MS medium supplemented with BAP (10 mg l−1). Among the five genotypes tested, UMI 29 showed the highest percentages of callus initiation and shoot induction and the highest mean number of developed shoots. The protocol described in this study can reliably be used for routine transformation of tropical maize inbreds.

    Revisiting Low Resource Status of Indian Languages in Machine Translation

    Machine translation performance for Indian languages is hampered by the lack of large-scale, multilingual, sentence-aligned corpora and robust benchmarks. In this paper, we present and analyse an automated framework for obtaining such a corpus for Indian-language neural machine translation (NMT) systems. Our pipeline consists of a baseline NMT system, a retrieval module and an alignment module, and is applied to publicly available websites such as government press releases. Our main contribution is an incremental method that uses this pipeline to iteratively increase the size of the corpus and improve each component of the system. We also evaluate design choices such as the choice of pivot language and the effect of iteratively increasing the corpus size. Besides providing an automated framework, our work produces a relatively larger corpus than the existing corpora available for Indian languages. This corpus helps us obtain substantially improved results on the publicly available WAT evaluation benchmark and other standard evaluation benchmarks. Comment: 10 pages, few figures, preprint under review.
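
    The incremental idea (translate, retrieve, filter, retrain, repeat) can be sketched as a toy loop. Everything here is a hypothetical stand-in, not the paper's system: `train` "retrains" by learning a word lexicon from mined pairs instead of an NMT model, retrieval uses token-set Jaccard similarity, and a similarity `threshold` plays the role of the alignment module. The "languages" are made-up tokens.

```python
SEED_LEXICON = {"x": "X", "y": "Y"}  # hypothetical seed dictionary


def train(corpus):
    """Stand-in for (re)training NMT: learn a word lexicon from mined pairs."""
    lexicon = dict(SEED_LEXICON)
    for src, tgt in corpus:
        s_toks, t_toks = src.split(), tgt.split()
        if len(s_toks) == len(t_toks):  # naive positional word alignment
            lexicon.update(zip(s_toks, t_toks))
    # Word-by-word "translation"; unknown words are copied through unchanged.
    return lambda sent: " ".join(lexicon.get(w, w) for w in sent.split())


def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def mine_pairs(src_sents, tgt_sents, translate, threshold):
    """Translate each source sentence and keep its best-matching target if similar enough."""
    if not tgt_sents:
        return []
    pairs = []
    for s in src_sents:
        hyp = translate(s)
        best = max(tgt_sents, key=lambda t: jaccard(hyp, t))
        if jaccard(hyp, best) >= threshold:
            pairs.append((s, best))
    return pairs


def build_corpus(src_sents, tgt_sents, rounds=5, threshold=0.3):
    corpus = []
    for _ in range(rounds):
        translate = train(corpus)  # a better model should mine more pairs
        new = [p for p in mine_pairs(src_sents, tgt_sents, translate, threshold)
               if p not in corpus]
        if not new:  # stop once an iteration adds nothing
            break
        corpus.extend(new)
    return corpus


print(build_corpus(["x y z", "z w"], ["X Y Z", "Z W", "Q R"]))
# [('x y z', 'X Y Z'), ('z w', 'Z W')]
```

    In the first round only "x y z" is close enough to a target to be mined; learning z → Z from that pair lets the second round mine "z w", which shows how corpus growth and model improvement feed each other.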

    Internship Report in Community Pharmacy

    Internship report completed as part of the Integrated Master's in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the University of Coimbra.

    TamilTB - An Effort Towards Building a Treebank for Tamil

    This talk presents our ongoing effort to build a PDT-style dependency treebank for Tamil. It outlines the annotation scheme and the annotation at the morphological and surface-syntax layers, and discusses issues such as ambiguous structures, NP compounding, coordination phenomena and clitics with respect to treebank annotation. Since our ultimate goal in this project is a feature-rich parsing framework for Tamil, we also present the results we obtained in automatic parsing (rule-based and corpus-based) using the developed resources. Some problematic issues in Tamil parsing are also discussed.

    Parsing under-resourced languages: Cross-lingual transfer strategies for Indian languages

    Fast adaptation of language technologies to any language hinges on the availability of fundamental tools and resources such as monolingual and parallel corpora, annotated corpora, part-of-speech (POS) taggers and parsers. Languages that lack these fundamental resources are often referred to as under-resourced languages. In this thesis, we address the problem of cross-lingual dependency parsing of under-resourced languages. We apply three methodologies to induce dependency structures: (i) projecting dependencies from a resource-rich language to under-resourced languages via word-alignment links in a parallel corpus; (ii) parsing under-resourced languages with parsers whose models are trained on treebanks of other languages and use only POS categories rather than actual word forms, where we address incompatibilities in annotation style between source-side parsers and target-side evaluation treebanks by harmonizing the annotations to a common standard; and (iii) a new under-resourced scenario in which we project dependencies to under-resourced languages using machine-translated parallel corpora instead of human-translated ones. We apply these methodologies to five Indian languages (ILs): Hindi, Urdu, Telugu, Bengali and Tamil (in order of decreasing availability of treebank data). To make evaluation possible for Tamil, we develop a dependency treebank for Tamil from scratch and use it both for evaluation and as a source for parsing other ILs. Finally, we list strategies that can be used to obtain dependency structures for target languages under different resource-poor scenarios.
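
    Methodology (i), projecting dependencies through word alignments, can be illustrated with a minimal direct-projection sketch. It assumes one-to-one alignments and simply copies each source arc onto the aligned target words; the thesis's actual handling of unaligned or many-to-many links may differ, and the function name `project_heads` and the example sentence are hypothetical.

```python
def project_heads(src_heads, alignment, tgt_len):
    """Project a source dependency tree onto a target sentence.

    src_heads[i] is the 1-based head of source token i+1 (0 = root).
    alignment maps 1-based source indices to 1-based target indices (one-to-one).
    Returns target heads (1-based, 0 = root, None = left unattached).
    """
    tgt_heads = [None] * tgt_len
    for s_dep, s_head in enumerate(src_heads, start=1):
        t_dep = alignment.get(s_dep)
        if t_dep is None:
            continue  # unaligned source word: nothing to project
        if s_head == 0:
            tgt_heads[t_dep - 1] = 0  # the root stays the root
        elif s_head in alignment:
            tgt_heads[t_dep - 1] = alignment[s_head]  # copy the arc across the links
    return tgt_heads


# Hypothetical SVO source "John saw Mary" (heads 2, 0, 2) aligned to an
# SOV-ordered target glossed "John Mary saw": 1->1, 2->3, 3->2.
print(project_heads([2, 0, 2], {1: 1, 2: 3, 3: 2}, 3))  # [3, 3, 0]
```

    Target words that receive `None` would need a fallback (for example a heuristic attachment) before the projected trees can train a parser, which is one reason projection alone rarely yields complete trees.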